1. A method of operating a computer to identify all chemical structures that are 
defined by a Markush type formula (200, 220, 260) stored in a database and that 
match a given query structure (200), without the necessity of generating 
said chemical structures, comprising the steps of: 

(i) Processing said Markush type formula(e) and said 
query(ies) into a computer readable form (210), 

(ii) Searching for partially relaxed subgraph 
isomorphism(s) for each said query (230, 240, 250), 

(iii) Retrieving data (270). 

2. The method of claim 1, wherein said database is made of at least one 
combinatorial library stored as a Markush type formula (200). 

3. The method of claim 2, wherein said libraries are each made of one scaffold 
and at least one R-group as constituents. 

4. The method of claim 1, wherein said given query structure is either an exact 
chemical structure or a chemical substructure. 

5. The method of claim 1, wherein said given query structure is said to match 
said chemical structure if said given query structure is exactly said chemical 
structure. 

6. The method of claim 1, wherein said given query structure is said to match 
said chemical structure if said given query structure is either said chemical 
structure or a substructure of said chemical structure. 

7. The method of claim 1, wherein said identification can be performed with said 
query structure as sole input (200), without the requirement of additional 
information to perform said identification. 

8. The method of claim 1, wherein said generation of chemical structures is 
neither required before nor during the search. 

9. The method of claim 1, wherein said processing of said step (i) can be 
performed either before or during said identification. 

10. The method of claim 1, wherein said Markush type formula(e) can be either 
pre-processed (210) or processed during said identification. 

11. The method of claim 1, wherein said query(ies) is (are) stored in a 
database or not. 

12. The method of claim 3, wherein said processing of said step (i) comprises the 
steps of: 

(a) Building of graphs and binary description of said 
scaffolds and R-groups, 

(b) Building of graph and binary description of said 
query(ies). 

13. The method of claim 12, wherein said binary description of said step (a) 
contains or consists of the following information: 

1. For each scaffold: 

(a) Number of atoms present in said scaffold, 

(b) Graph of said scaffold, 
(c) Number of R-groups, 

(d) Label of said R-groups, 

(e) Position of said R-groups in said graph, 

(f) Number of neighbours for each R-group and 
position of said neighbours in said graph, 

2. For each R-group: 

(a) R-group identification (ID), 

(b) Number of atoms present in said R-group, 

(c) Graph of said R-group, 

(d) Number of attachment points in said R-group, 
(e) Identification of the attachment points (atom indexing), 

(f) Atoms involved in said attachment points. 
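The binary description of claim 13 can be pictured as a few record types. The following Python sketch is one possible in-memory layout, not the format used by the claimed method; the class and field names (`Graph`, `ScaffoldDescription`, `RGroupMember`, `attachment_atoms`, and so on) are hypothetical, and R-groups are assumed to appear in the scaffold graph as pseudo-atoms whose ordered neighbour lists give the attachment order i.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Undirected molecular graph: one label per atom index plus an adjacency map."""
    atoms: list[str]
    adjacency: dict[int, set[int]] = field(default_factory=dict)

    def add_bond(self, a: int, b: int) -> None:
        self.adjacency.setdefault(a, set()).add(b)
        self.adjacency.setdefault(b, set()).add(a)

    def neighbours(self, a: int) -> set[int]:
        return self.adjacency.get(a, set())


@dataclass
class ScaffoldDescription:
    graph: Graph                             # 1(b): R-groups kept as pseudo-atoms (assumption)
    rgroup_positions: dict[str, int]         # 1(d), 1(e): R-group label -> pseudo-atom index
    rgroup_neighbours: dict[str, list[int]]  # 1(f): ordered neighbours, list index = order i

    @property
    def n_atoms(self) -> int:                # 1(a): counts pseudo-atoms too in this sketch
        return len(self.graph.atoms)

    @property
    def n_rgroups(self) -> int:              # 1(c)
        return len(self.rgroup_positions)


@dataclass
class RGroupMember:
    member_id: int
    graph: Graph                             # 2(b), 2(c): atom count is len(graph.atoms)
    attachment_atoms: list[int]              # 2(e), 2(f): atom for attachment point of order i

    @property
    def n_attachment_points(self) -> int:    # 2(d)
        return len(self.attachment_atoms)


@dataclass
class RGroupDescription:
    rgroup_id: str                           # 2(a)
    members: list[RGroupMember]
```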

14. The method of claim 3, wherein said partially relaxed subgraph isomorphism 
searching of said step (ii) of claim 1 (240) is performed on all said 
libraries and comprises the steps of: 

(a) Scaffold reading (300), 

(b) Partially relaxed subgraph isomorphism searching of 
said query against said scaffold (310), 

(c) Processing of all isomorphisms (320 to 390), 

for each library of said database (220, 260). 
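The per-library loop of claim 14 can be sketched as a driver that never enumerates structures: it only reads each scaffold and hands every isomorphism found on it to the processing of step (c). In this Python sketch, `search_libraries`, `relaxed_search` and `process_isomorphism` are hypothetical names; the two callables stand in for steps (b) and (c), and the partially relaxed search itself is not implemented here.

```python
def search_libraries(libraries, query, relaxed_search, process_isomorphism):
    """libraries: iterable of (library_id, scaffold) pairs.
    relaxed_search(query, scaffold): list of isomorphisms (step (b), 310).
    process_isomorphism(library_id, iso): a result or None (step (c), 320 to 390).
    Returns {library_id: [results]} for libraries with at least one result."""
    hits = {}
    for library_id, scaffold in libraries:              # loop over libraries (220, 260)
        isomorphisms = relaxed_search(query, scaffold)  # steps (a) and (b)
        results = []
        for iso in isomorphisms:                        # step (c)
            result = process_isomorphism(library_id, iso)
            if result is not None:
                results.append(result)
        if results:
            hits[library_id] = results
    return hits
```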

15. The method of claim 14, wherein said processing of said step (c) comprises 
the steps of: 

(1) Counting the number of atoms of said query associated 
with each constituent of said library (330), 

(2) Identifying which atoms of said query are associated 
with said constituent(s) (330), 

(3) Identifying on which constituent(s) said query is located 
(330). 

(4) Processing of said isomorphism taking into account 
said query location of said step (3) (340 to 380), 

for each isomorphism detected. 

16. The method of claim 15, wherein said step (3) defines the global localisation 
of said query on said library constituent(s) as being either only the scaffold 
(340), or only one single R-group (350), or the scaffold and at 
least one R-group (350). 
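Steps (1) to (3) of claim 15 and the three cases of claim 16 reduce to grouping the query atoms by the constituent they are associated with in one isomorphism. A minimal Python sketch, assuming each isomorphism is given as a hypothetical `association` dict from query atom index to either the tag `"scaffold"` or an R-group label; the function name and the returned case names are illustrative only.

```python
from collections import defaultdict

SCAFFOLD = "scaffold"  # hypothetical tag for query atoms mapped onto the scaffold


def localise(association):
    """association: query atom -> constituent it is associated with in one
    isomorphism (SCAFFOLD or an R-group label such as "R1")."""
    atoms_by_constituent = defaultdict(list)
    for q_atom in sorted(association):                 # step (2): which atoms go where
        atoms_by_constituent[association[q_atom]].append(q_atom)
    counts = {c: len(a) for c, a in atoms_by_constituent.items()}  # step (1)
    constituents = set(counts)                         # step (3): global localisation
    if constituents == {SCAFFOLD}:
        kind = "scaffold_only"                         # claim 17 (i), 370
    elif len(constituents) == 1:
        kind = "single_rgroup"                         # claim 17 (ii), 380
    else:
        kind = "scaffold_and_rgroups"                  # claim 17 (iii), 360: all other cases
    return dict(atoms_by_constituent), counts, kind
```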



17. The method of claim 15, wherein said processing of said step (4) comprises 
the steps of: 

(i) Processing of said isomorphism if said query is only 
located on the scaffold of said library (370), 

(ii) Processing of said isomorphism if said query is only 
located on a single R-group of said library (380), 

(iii) Processing of said isomorphism if said query is located 
on the scaffold and at least one R-group of said library 
(360 = all other cases). 

18. The method of claim 17, wherein said processing of said step (i) (370) 
comprises the step of storing said chemical structures of claim 1 matching the 
query as a sub-library identical to said library (400). 

19. The method of claim 17, wherein said processing of said step (ii) (380) 
comprises the steps of: 

(a) Identifying members of said single R-group containing 
said query (500, 510, 530, 700 to 730), 

(b) Flagging said members (520). 

20. The method of claim 19, wherein said chemical structures of claim 1 
matching the query are stored as a sub-library corresponding to a Markush 
type formula made of said scaffold of claim 14, all members of R-groups not 
associated to said query and said flagged members of said single R-group 
identified by said query in said step (a) of claim 19 (550), if said single 
R-group has at least one member flagged (540). 

21. The method of claim 17, wherein said processing of said step (iii) (360) 
comprises the steps of: 

(a) Identifying if atoms of said query are associated with 
an R-group (610), 

(b) Isomorphism searching (640, 700 to 730) of the sub- 
query (620) formed by said atoms, on each member 
(630, 660) of said associated R-group, if at least one 
atom is associated to said R-group (610), 

(c) Flagging each member of said associated R-group for 
which at least one isomorphism is detected (650), 

for each R-group of said library (600, 670). 

22. The method of claim 21, wherein all members of an R-group of said library 
are flagged if said R-group is not involved in said isomorphism of step (b) of 
claim 14. 

23. The method of claim 21, wherein said chemical structures of claim 1 
matching the query are stored as a sub-library corresponding to a Markush 
type formula made of said scaffold of claim 14, all members of R-groups not 
associated to said query and said flagged members of said associated 
R-groups (690), if all said associated R-groups have at least one member 
flagged (680). 
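Claims 20 to 23 describe how a sub-library is assembled from the flags without enumerating structures. The sketch below assumes R-groups and flags are given as plain dicts of member IDs (the function name `build_sub_library` and the dict layout are hypothetical). R-groups not involved in the isomorphism keep all their members, as in claim 22, and no sub-library is produced when an associated R-group has no flagged member (tests 540 and 680).

```python
def build_sub_library(scaffold_id, rgroup_members, flagged):
    """rgroup_members: R-group label -> list of member IDs in the library.
    flagged: R-group label -> set of member IDs matching the sub-query, given
    only for R-groups associated with at least one query atom.
    Returns the sub-library as a dict, or None if it would be empty."""
    sub_rgroups = {}
    for label, members in rgroup_members.items():
        if label in flagged:                      # R-group associated with query atoms
            kept = [m for m in members if m in flagged[label]]
            if not kept:                          # 540 / 680 fails: no sub-library
                return None
        else:                                     # claim 22: not involved, keep every member
            kept = list(members)
        sub_rgroups[label] = kept                 # members stored as IDs (claim 24)
    return {"scaffold": scaffold_id, "rgroups": sub_rgroups}
```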

24. The method of claim 23, wherein said flagged members that match said sub- 
query are kept in a list for said isomorphism searching as IDs pointing to 
graphs. 

25. The method of claim 21, wherein the association of atoms in said query with 
atoms in said scaffold is saved, defining the partial localisation of said query 
on the sub-library. 

26. The method of claim 21, wherein the same list of members is used for 
different R-groups of said library that share the same members. 

27. The method of claim 21, wherein said sub-query isomorphism searching of 
said step (b) comprises the steps of: 

(1) Building said sub-query to be searched in said 
associated R-group (620), 

(2) Determining attachment points' constraints (620), 

(3) Isomorphism searching (640, 700 to 730) with said 
attachment points' constraints for each said associated 
R-group's member (630, 660). 

28. The method of claim 27, wherein graph connectivity of said sub-query is 
checked in step (1), i.e. it is checked that the atoms associated to a given 
R-group form a connected graph. 
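Building the sub-query of claim 27, step (1), and the connectivity check of claim 28 can be sketched as an induced subgraph followed by a breadth-first search. The adjacency-dict representation and the function names `build_sub_query` and `is_connected` are assumptions made for illustration.

```python
from collections import deque


def build_sub_query(atoms, query_adj):
    """Induced subgraph of the query on the atoms associated with one R-group."""
    keep = set(atoms)
    return {a: query_adj[a] & keep for a in keep}


def is_connected(sub_adj):
    """Breadth-first search: True if every sub-query atom is reachable from one start atom."""
    if not sub_adj:
        return True
    start = next(iter(sub_adj))
    seen = {start}
    queue = deque([start])
    while queue:
        for nb in sub_adj[queue.popleft()]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(sub_adj)
```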

29. The method of claim 27, wherein said isomorphism searching of said step (3) 
is either partially relaxed or not (720 or 710). 

30. The method of claim 27, wherein said determining of attachment points' 
constraints of said step (2) is defined as follows: 

(i) For each neighbour C[i] of order i of said R-group in 
said scaffold, if said neighbour is associated to an atom 
of said query, then D[i] represents said atom in said 
query, otherwise D[i]=0, 

(ii) For each said order i, if D[i] is defined, then for each 
of the neighbours of D[i] in said query, if said neighbour is 
mapped to said R-group, A[i] represents said 
neighbour, otherwise A[i] is not defined (A[i]=0), 

(iii) The array A represents the constraints of said 
attachment points. 
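The constraint array of claim 30 can be computed directly from one isomorphism. This Python sketch assumes the isomorphism is split into two hypothetical dicts, `scaffold_map` (query atom to scaffold atom) and `rgroup_map` (query atom to R-group label); the list index plays the role of the order i, `None` stands for the claim's 0, and where several neighbours of D[i] lie on the R-group the sketch simply takes the first one, a case the claim does not address.

```python
def attachment_constraints(c, scaffold_map, rgroup_map, query_adj, rgroup):
    """c: scaffold atoms C[i] bonded to the R-group pseudo-atom, list index = order i.
    scaffold_map: query atom -> scaffold atom, for query atoms on the scaffold.
    rgroup_map: query atom -> R-group label, for query atoms on an R-group.
    query_adj: query adjacency (atom -> set of neighbours).
    Returns A, where A[i] is a query atom or None (A[i] = 0 in the claim)."""
    scaffold_to_query = {s: q for q, s in scaffold_map.items()}
    d = [scaffold_to_query.get(ci) for ci in c]   # step (i): D[i], None if unmapped
    a = []
    for di in d:                                  # step (ii)
        ai = None
        if di is not None:
            for nb in sorted(query_adj[di]):
                if rgroup_map.get(nb) == rgroup:  # neighbour lies on this R-group
                    ai = nb                       # assumption: keep the first such atom
                    break
        a.append(ai)
    return a                                      # step (iii): the constraint array A
```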

31. The method of claim 27, wherein said isomorphism searching of said step (3) 
comprises the steps of: 

(a) Reading said member (630), 

(b) Searching of all the isomorphisms of said sub-query 
(640, 700 to 730) on said member with said constraints 
on attachment points: said atom A[i] of said sub-query 
must be mapped to the attachment point of order i of 
said member, for each i where A[i] is defined. 
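The constrained search of claim 31 is an ordinary subgraph isomorphism search in which some sub-query atoms are pinned to attachment points. The sketch below uses plain backtracking, a simple stand-in that is not necessarily the algorithm behind steps 640 and 700 to 730, and implements only the non-relaxed variant (710 in claim 29). `pinned` is a hypothetical dict mapping each defined A[i] to the member atom of attachment point i; all names are illustrative. Counting the returned mappings corresponds to claim 32, and `first_only=True` to claim 33.

```python
def constrained_isomorphisms(sub_atoms, sub_adj, mem_atoms, mem_adj, pinned, first_only=False):
    """sub_atoms / mem_atoms: atom index -> element label.
    sub_adj / mem_adj: atom index -> set of neighbouring atom indices.
    pinned: sub-query atom A[i] -> member atom of attachment point i.
    Returns a list of injective, label- and bond-preserving mappings
    (sub-query atom -> member atom)."""
    order = list(pinned) + [q for q in sub_atoms if q not in pinned]
    results = []

    def consistent(q, t, mapping):
        if sub_atoms[q] != mem_atoms[t] or t in mapping.values():
            return False
        # every already-mapped neighbour of q must map onto a neighbour of t
        return all(mapping[n] in mem_adj[t] for n in sub_adj[q] if n in mapping)

    def extend(i, mapping):
        if first_only and results:
            return                                # claim 33: stop after the first hit
        if i == len(order):
            results.append(dict(mapping))
            return
        q = order[i]
        candidates = [pinned[q]] if q in pinned else list(mem_atoms)
        for t in candidates:
            if consistent(q, t, mapping):
                mapping[q] = t
                extend(i + 1, mapping)
                del mapping[q]

    extend(0, {})
    return results
```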

32. The method of claim 31, wherein the number of isomorphisms is counted in 
said step (b). 

33. The method of claim 31, wherein only the first isomorphism is searched in said 
step (b). 

34. The method of claim 31, wherein said method further comprises the step of 
saving all the isomorphisms' descriptions, which define, along with said 
partial localisation, the exact localisation of said query on said library. 

35. The method of claim 31, wherein said searching of said isomorphisms (640, 
700 to 730) of said step (b) comprises the additional steps of: 

(i) Analysing each said member for the presence of a 
nested R-group (700), 

(ii) Proceeding recursively according to claim 14 (720=240), 
wherein said query of claim 14 now corresponds to said 
sub-query, said scaffold of claim 14 now corresponds 
to said R-group, and said R-groups of claim 14 are said 
nested R-groups, until said nested R-groups are no longer 
involved in an isomorphism, if said member contains a 
nested R-group (700). 

36. The method of claim 3, wherein said data retrieval of said step (iii) retrieves at 
least one of the following pieces of information: 

• For said database as a whole: 

o Said database contains or does not contain said query, 
i.e. whether at least one said library contains said 
query, 

o A list of all the combinatorial libraries containing said 
query, 

o A list of all the combinatorial libraries not containing 
said query, 

o A list and number of said scaffolds entirely containing 
said query, 

o A list and number of said scaffolds not entirely containing 
said query, 

o A list and number of said R-groups entirely containing 
said query, whether nested R-groups are allowed or 
not, 

o A list and number of said R-groups not entirely containing 
said query, whether nested R-groups are 
allowed or not, 

o The total number of isomorphisms retrieved in step (b) 
of claim 14 (310) for all the libraries, whether or not said 
associated R-groups of claim 15 have at least one 
member flagged during said processing of said step (4) 
(540, 680), 

o The global or partial localisation for all the 
isomorphisms, 

o The first isomorphism found with or without its global or 
partial localisations, 

• For each said library: 

o Said library contains or does not contain said query, 

o A list and number of all the enumerated (specific) 
structures or non-enumerated structures of said library 
matching said query, 

o The number of unique structures of said library 
matching said query, whatever the number of partial 
localisations of said query on said library, 

o The number of times said query is located on said 
scaffold only, or on said R-groups only, or spans 
across said scaffold and said R-group(s). This 
corresponds to the number of global localisations, 

o The total number of isomorphisms retrieved in step (b) 
of claim 14, whether said associated R-groups of claim 
15 have at least one member flagged during said 
processing of step (4) or not. This corresponds to the 
total number of said partial localisations of said 
query on said library, 

o A list of all said partial localisations of said query on 
said library, each one corresponding to an 
isomorphism and defining a sub-library. 

• For each said R-group: 

o Said R-group contains or does not contain said query 
or said sub-query, 

o A list and number of all the specific members or non- 
enumerated members of said R-group containing said 
query or said sub-query, whether nested R-groups are 
allowed or not, 

o A list and number of all the specific members or non- 
enumerated members of said R-group not containing 
said query or said sub-query, whether nested R-groups 
are allowed or not, 

o The number of times said query or sub-query is found 
in said R-group's members, whether or not exact localisation 
or nested R-groups are taken into account. This 
corresponds to the total number of isomorphisms for all 
said R-group's members. 

• For each said member of said R-group: 

o Said member contains or does not contain said query 
or said sub-query, 

o The number of times said query or sub-query is found 
on said member whether nested R-groups are taken 
into account or not. This corresponds to the number of 
isomorphisms of said sub-query on said member, 

o A list and number of all the specific structures or non- 
enumerated structures described by said member 
containing said query or said sub-query, if said member 
contains nested R-group(s), 

o The exact localisation of said query or sub-query on 
said member. 

• For each single isomorphism of said query or sub-query: 

o The library corresponding to said isomorphism, 

o A list and number of R-groups associated to at least 
one atom of said query in said isomorphism, 

o A list and number of R-groups not associated to any of 
the atoms of said query in said isomorphism, 

o A list and number of members containing said query or 
said sub-query for each said R-group, 

o A list and number of members not containing said 
query or said sub-query for each said R-group, 

o The global localisation of said query on said library, i.e. 
said query is either only on the scaffold, or only 
on one R-group, or on the scaffold and at least 
one R-group, 

o The partial localisation of said query on said library, i.e. 
the atoms in the scaffold and the R-group(s) to which 
atoms in the query are mapped, 

o A list of all the specific structures or non-enumerated 
structures containing said query and mapping on said 
library following said partial localisation, 

• For all the isomorphisms of said query or sub-query: 

o All the information gathered in the aforementioned 
points. 
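Several of the counts listed in claim 36 can be obtained from a sub-library without generating its structures, since the number of structures a Markush formula describes is a product of per-R-group member counts. A minimal sketch, assuming each R-group position is chosen independently, nesting is acyclic, and duplicate structures (for example from symmetry or from overlapping sub-libraries) are not removed, so it counts enumerated rather than unique structures; the function name and input layout are hypothetical.

```python
from math import prod


def count_structures(scaffold_rgroups, rgroups, nested):
    """scaffold_rgroups: R-group labels attached to the scaffold.
    rgroups: label -> member IDs kept in the (sub-)library.
    nested: member ID -> labels of R-groups nested inside that member."""
    def member_count(member):
        # a member without nested R-groups describes exactly one fragment (empty prod is 1)
        return prod(group_count(label) for label in nested.get(member, ()))

    def group_count(label):
        return sum(member_count(m) for m in rgroups[label])

    return prod(group_count(label) for label in scaffold_rgroups)
```

For example, with R1 = {a, b}, R2 = {c, d, e} on the scaffold and a nested R3 = {f, g} inside member b, the sketch counts (1 + 2) × 3 = 9 structures.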

37. The method of any of claims 1 or 36, wherein said data retrieval of said step 
(iii) retrieves said structures in the form of either enumerated or 
non-enumerated structures. 

38. The method of any of claims 1 or 36, wherein said data retrieval of said step 
(iii) takes into account nested R-groups. 

39. The method of any of claims 1 or 36, wherein said data retrieval of said step 
(iii) takes into account the exact localisation of said query for each said 
isomorphism. 

40. The method of any of the preceding claims, wherein screening technique 
option(s) is (are) applied, thereby reducing searching time. 

41. The method of claim 40, wherein said screening technique option relies on 
substructural features such as keys. 
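Claim 41 leaves the screening technique open. One common form of key-based screening is a bit test: a graph can contain the query only if every structural key set for the query is also set for the target. The sketch below applies that test to single graphs, for example a sub-query against an R-group member before the search of claim 31; `KEY_DICT` is a small hypothetical key set, not a standard one, and the function names are illustrative. Passing the screen is necessary but not sufficient, so the isomorphism search still has to run on the survivors.

```python
KEY_DICT = ["C", "N", "O", "S", "Cl", "C-C", "C-N", "C-O", "N-O"]  # hypothetical keys


def structural_keys(atoms, adj):
    """atoms: atom index -> element; adj: atom index -> set of neighbours.
    Sets one bit per key (element present, or bonded element pair present)."""
    present = set(atoms.values())
    for a, nbs in adj.items():
        for b in nbs:
            present.add("-".join(sorted((atoms[a], atoms[b]))))
    bits = 0
    for i, key in enumerate(KEY_DICT):
        if key in present:
            bits |= 1 << i
    return bits


def passes_screen(query_keys, target_keys):
    # every bit set in the query must also be set in the target
    return (query_keys & ~target_keys) == 0
```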

42. The method of any of the preceding claims, wherein said method can be 
integrated into a pipeline. 

43. A computer program for accomplishing the automatic identification of all the 
chemical structures that are defined by a Markush type formula(e) stored in 
a database and that match a given query structure, without the necessity of 
generating said chemical structures, comprising computer code means 
adapted to perform all steps according to any of the preceding claims when 
said program is run on a computer. 

44. The computer program of any of the preceding claims, embodied on a 
computer readable medium. 

45. A computer readable medium having a program recorded thereon, wherein the 
program causes the computer to carry out the method according to any of 
the preceding claims. 

46. A computer program product stored on a computer usable medium, 
comprising a computer readable program means for causing the computer to 
identify all the chemical structures that are defined by a Markush type formula 
stored in a database and that match a given query structure, without the 
necessity of generating said chemical structures, according to any of the 
preceding claims. 

47. A computer loadable product directly loadable into the internal memory of a 
digital computer, comprising software code portions for performing the 
method of any of the preceding claims when said product is run on a 
computer. 

48. An apparatus for carrying out the method of any of the preceding claims, 
including data input means for inserting said at least one given query 
structure characterized in that there are provided means for carrying out the 
method of any of the preceding claims. 

49. A drug compound obtained by synthesising a molecule determined by 
performing the method according to any of the preceding claims. 